General purpose

I’m creating Rmds for each of the datastreams used in this project. Because our deployments were a little complex, and both SWAP and Firesting datasets were somewhat cobbled together, it’s important to document as many details as possible as clearly as possible in order to streamline QC, clearly ID and explain any data decisions, and make the methods section easy to write.

SWAP deployment

We deployed a total of 15 SWAP sensor rods and 6 SWAP reference electrodes across the plots for TEMPEST 2 (June 2023). All three plots were outfitted in the same way: 1 nest of 5 SWAP rods, arranged in an approximate circle, with electrodes intended for 15, 25, 40, and 60 cm. All rods were installed 10 cm too shallow, so actual measurement depths were 5, 15, 30, and 50 cm. The 5, 15, and 30 cm depths match TEROS, and 5 and 30 cm match the Firesting endmembers. As a quick note for TEMPEST 3: we should deploy our DO sensors at 5, 15, 30, and 50 cm to match SWAP (if using SWAP). Each nest had two reference electrodes deployed across the circle from each other at ~10 cm depth. All sensors were controlled via dataloggers powered by isolated circuits (separate batteries), and all times are reported in EST (GMT-5), as with all other datalogger datasets.

Because Control looks weird, let’s zoom in. I’m not quite sure why the data look the way they do, but when separated by reference electrode, the data look fine.

QC

QC Step 1: Picking a reference electrode

There are two reference electrodes in each plot, and they should theoretically be reading the same thing. Electrode measurements are useless without the reference electrode, so it’s up to us to figure out which reference electrode is most trustworthy for each site. In the example above, it’s clear that reference B (‘rb’) is seeing strong diurnal variability, and is not reading correctly prior to 6/6, while reference A is doing about what we’d expect for the Control plot. Let’s look at all plots together to determine which reference makes the most sense for each plot.

5 cm

15 cm

30 cm

50 cm

Based first on 50 cm, which is the depth least influenced by flooding (so the treatment plots and the Control plot should show similar patterns), we can see that Control rb behaves differently from the other 5 reference-plot combos, so we should use ra in Control. However, it’s a little unclear visually how we should choose between ra and rb for the Estuarine and Freshwater plots. Let’s create some plots to compare:

If we assume that redox should generally fall along the 1:1 line, and that the excursions are diagnostic of less “real” data, it appears that ra generally does “better” than rb. Since we don’t have a real ground-truth, we’ll go with ra across the board, partially based on the plot above, partially based on simple cross-site consistency.
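As a rough numeric companion to the 1:1 plots, here is a minimal sketch that measures how far paired ra/rb readings sit from the 1:1 line. The `ref` column and the toy values are assumptions for illustration, not the project's actual data structure.

```r
library(dplyr)
library(tidyr)

# Toy data: one row per timestamp/plot/reference (the `ref` column is hypothetical)
swap <- tibble(
  datetime = rep(as.POSIXct("2023-06-06 00:00:00", tz = "Etc/GMT+5") + 0:3 * 900, times = 2),
  plot = "Control",
  ref = rep(c("ra", "rb"), each = 4),
  redox_mv = c(400, 402, 398, 401, 405, 460, 350, 399)
)

ref_compare <- swap %>%
  # One row per timestamp, with ra and rb side by side
  pivot_wider(names_from = ref, values_from = redox_mv) %>%
  # Signed distance from the 1:1 line
  mutate(resid_mv = rb - ra) %>%
  group_by(plot) %>%
  summarise(
    mean_abs_resid_mv = mean(abs(resid_mv), na.rm = TRUE),
    max_abs_resid_mv = max(abs(resid_mv), na.rm = TRUE),
    .groups = "drop"
  )
```

This only quantifies disagreement between the two references; it cannot say which one is wrong, so the visual check is still needed. With real data, sensor and depth columns would also need to be kept as identifiers before pivoting.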

Decision 1: Use the ra reference

Our dataset currently looks like this, and there are a couple things left to address:

  1. early deployment equilibration artifacts that need to be trimmed off
  2. erroneous zero values (hard to see, but they’re there…) that need to be removed
  3. conversion of raw redox values to a standard reference (Eh)
  4. how to average (mean v median) across different sensors for the same depth/plot

QC Step 2: Remove erroneous zeros

Let’s do the easy stuff first. Here’s how the time-series look with all the redox_mv = 0 values labeled as points:

Control

Freshwater

Estuarine

We will remove all values that are 0 and gapfill. Note that this will overwrite a couple of real values that just happen to be 0 (e.g., see Freshwater 5 and 15 cm), but since those points have sensible values on either side, this will not dramatically alter any of our results, particularly since it happens prior to binning.

Control

Freshwater

Estuarine

Decision 2: Replace all zeros with gap-filled values
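A minimal sketch of Decision 2, assuming linear interpolation in time as the gap-filling method (the text above does not say which method was used). The `sensor` column and the toy values are hypothetical.

```r
library(dplyr)

# Toy data: two sensors at 5-minute resolution, with zeros standing in for errors
swap <- tibble(
  sensor = rep(c(1, 2), each = 5),
  datetime = rep(as.POSIXct("2023-06-06 00:00:00", tz = "Etc/GMT+5") + 0:4 * 300, times = 2),
  redox_mv = c(410, 0, 414, 416, 0, 250, 252, 0, 0, 258)
)

# Linearly interpolate NAs in time; leading/trailing gaps stay NA (rule = 1)
fill_gaps <- function(x, t) {
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)  # not enough points to interpolate
  approx(x = as.numeric(t[ok]), y = x[ok], xout = as.numeric(t), rule = 1)$y
}

swap_filled <- swap %>%
  mutate(redox_mv = na_if(redox_mv, 0)) %>%   # zeros become NA
  group_by(sensor) %>%                        # never interpolate across sensors
  arrange(datetime, .by_group = TRUE) %>%
  mutate(redox_mv = fill_gaps(redox_mv, datetime)) %>%
  ungroup()
```

In this sketch a zero at the very end of a series is left as `NA` rather than extrapolated, which seems like the more conservative choice.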

QC Step 3: Remove pre-equilibration chunks

Because we know these data aren’t trustworthy, and because they make it hard to see the structure of our good data, we’ll remove them first. The question is where to cut, so let’s take an expanded view:

One easy first step: we deployed sensors at slightly different times. Since our goal is to compare things apples-to-apples (as much as possible), we can use the start of the Control record, 6/5 at 16:35, as our first cut.

That helps by removing the initial spike in Freshwater. However, numerous sensors, particularly in Freshwater, also report exaggerated values (>1000 mV). These appear to be sensor-specific, so we’ll tackle that next.

Decision 3: Drop datetimes <= “2023-06-05 16:35:00”
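A minimal sketch of this cut, assuming a POSIXct `datetime` column. The zone name `Etc/GMT+5` is how a fixed GMT-5 offset (EST with no daylight saving) is written in the tz database; note the inverted sign. The toy rows are hypothetical.

```r
library(dplyr)

tz_est <- "Etc/GMT+5"  # fixed UTC-5 offset, no daylight saving
cutoff <- as.POSIXct("2023-06-05 16:35:00", tz = tz_est)

# Toy data straddling the cutoff
swap <- tibble(
  datetime = as.POSIXct(
    c("2023-06-05 16:30:00", "2023-06-05 16:35:00", "2023-06-05 16:40:00"),
    tz = tz_est
  ),
  plot = "Freshwater",
  redox_mv = c(1200, 650, 640)
)

# Drop datetimes <= cutoff, i.e. keep only strictly later rows
swap_trimmed <- swap %>%
  filter(datetime > cutoff)
```

Building the cutoff with an explicit `tz` avoids it being silently interpreted in the local time zone of whichever machine knits the document.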

QC Step 4: Select sensors to keep

Now all the easy parts are done, and things are threatening to get subjective. I’m going to explore a couple of solutions for cleaning these data in a reproducible, unbiased way. We deploy 5 sensors at each depth in each plot because 1) there is a lot of spatial heterogeneity to capture, and 2) electrodes are highly sensitive to where they’re placed. Our goal here is to characterize the average behavior of redox, so if one sensor is behaving very differently from the others, we can toss it. However, if the sensors are all doing their own thing, we’ll need to establish common-sense rules or defend decisions with clear justifications. First, here’s a closer look at sensors by plot and depth:

Initial thoughts:

Control
  • 5cm: sensor #17 (green) is suspiciously high prior to 6/6 AM, and sensor #13 (gold) is suspiciously low relative to the other sensors.
  • 15cm: sensors generally look consistent
  • 30cm: sensor #7 (pink) might be suspiciously low, especially as we know this site likely did not experience large changes in redox. Sensor #3 (blue) is suspicious in the same way, albeit less so
  • 50cm: Sensor #8 (pink) is clearly different than the others. This is our clearest-cut candidate for removing.
Freshwater
  • 5cm: Sensor #13 (gold) is suspiciously high at the beginning, but then matches 17 closely.
  • 15cm: Sensor #14 is suspiciously high at the beginning
  • 30cm: This is a little confusing: the same patterns, but temporally lagged…
  • 50cm: All consistent
Estuarine
  • 5cm: Sensors #9 and #17 (red and green) are suspiciously high at the beginning
  • 15cm:
  • 30cm:
  • 50cm:

The more I look at these, the more concerned I get about over-cleaning, or biased cleaning. Since the responses are quite variable, I think maybe the only fair thing is to 1) remove the one sensor that is clearly anomalous compared to the other 4 in its plot/depth (Control 8) and 2) remove known issues associated with deployment (i.e., sensors that read high and then drop off precipitously at the start of the deployment: Control 17, Freshwater 13 and 14, and Estuarine 9 and 17).

This still leaves us with a decision point to clarify: when do we cut off the 5 sensors mentioned above? Let’s isolate them, and find a datetime threshold:

  • Control is good by 6/6 10:55
  • Freshwater are both good by 6/6 6:10
  • Estuarine are both good by 6/6 11:10

Let’s remove the one known bad sensor, and the bad starting data for the 5 above:

Decision 4: Drop Control sensor 8, and trim 5 sensors with longer equilibration times (detailed above)
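A minimal sketch of Decision 4 using a small lookup table of per-sensor cutoffs taken from the list above. It assumes "good by" means the reading at that timestamp is kept; the `sensor` column name and the toy rows are hypothetical.

```r
library(dplyr)
library(tibble)

tz_est <- "Etc/GMT+5"  # fixed UTC-5 offset

# Per-sensor "good by" times from the list above
sensor_cutoffs <- tribble(
  ~plot,        ~sensor, ~good_from,
  "Control",    17,      "2023-06-06 10:55:00",
  "Freshwater", 13,      "2023-06-06 06:10:00",
  "Freshwater", 14,      "2023-06-06 06:10:00",
  "Estuarine",  9,       "2023-06-06 11:10:00",
  "Estuarine",  17,      "2023-06-06 11:10:00"
) %>%
  mutate(good_from = as.POSIXct(good_from, tz = tz_est))

# Toy data covering each case
swap <- tibble(
  plot = c("Control", "Control", "Control", "Control", "Freshwater"),
  sensor = c(8, 17, 17, 1, 13),
  datetime = as.POSIXct(
    c("2023-06-06 12:00:00", "2023-06-06 10:50:00", "2023-06-06 10:55:00",
      "2023-06-05 18:00:00", "2023-06-06 06:00:00"),
    tz = tz_est
  ),
  redox_mv = c(150, 900, 420, 410, 1100)
)

swap_clean <- swap %>%
  filter(!(plot == "Control" & sensor == 8)) %>%          # drop the bad sensor entirely
  left_join(sensor_cutoffs, by = c("plot", "sensor")) %>%
  filter(is.na(good_from) | datetime >= good_from) %>%    # trim only the listed sensors
  select(-good_from)
```

Keeping the cutoffs in a table rather than in hard-coded `filter()` calls makes the decisions easy to audit and to copy into the methods section.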

Final steps

I believe this is the cleaned dataset, so only four steps remain, which are (in order):

  1. Convert to Eh
  2. Average (using the mean; the median jumps from sensor to sensor since we have 5 values at most points)
  3. Write out
  4. Plot final data
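To illustrate the mean-versus-median point in step 2, here is a small sketch with made-up values for 5 sensors at two consecutive time steps. The `step` and `sensor` columns are hypothetical.

```r
library(dplyr)

# Toy data: 5 sensors at one plot/depth, two consecutive time steps
swap <- tibble(
  step = rep(1:2, each = 5),
  sensor = rep(1:5, times = 2),
  eh_mv = c(600, 610, 650, 700, 720,   # step 1: sensor 3 holds the middle value
            600, 640, 655, 630, 720)   # step 2: sensor 2 holds the middle value
)

avg <- swap %>%
  group_by(step) %>%
  summarise(
    mean_eh_mv = mean(eh_mv),
    median_eh_mv = median(eh_mv),
    # With an odd number of sensors, the median is one sensor's own reading
    median_sensor = sensor[which(eh_mv == median(eh_mv))[1]],
    .groups = "drop"
  )
```

With an odd number of sensors the median is always a single sensor's reading, so the averaged series can hop between sensors as their rank order changes, while the mean blends all five.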

Unfortunately, there is one more major hurdle: converting to Eh requires knowing temperature, which we only have from TEROS, and that requires some gapfilling (i.e., there is no soil temperature at 50 cm). We’ll plot those just for fun prior to joining datasets. Note that we make the assumption here that temperature at 50 cm = temperature at 30 cm. While that’s wrong, I don’t have a simple, defensible way to estimate that temperature differential.
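A minimal sketch of the temperature gapfill and the Eh conversion, under stated assumptions. The `soil_temp_c` column name is hypothetical, and the reference-offset function uses PLACEHOLDER coefficients (200 mV at 25 °C, -1 mV per °C) that are not real electrode constants; they would need to be replaced with the values for the actual reference electrode.

```r
library(dplyr)

t0 <- as.POSIXct("2023-06-06 00:00:00", tz = "Etc/GMT+5")

# Toy TEROS temperatures: no 50 cm sensor
teros <- tibble(
  datetime = t0,
  plot = "Control",
  depth_cm = c(5, 15, 30),
  soil_temp_c = c(22, 20, 18)
)

# Assume temperature at 50 cm equals temperature at 30 cm
teros_filled <- bind_rows(
  teros,
  teros %>% filter(depth_cm == 30) %>% mutate(depth_cm = 50)
)

# HYPOTHETICAL linear offset: placeholder coefficients, NOT real electrode constants
ref_offset_mv <- function(temp_c, offset_25c_mv = 200, slope_mv_per_c = -1) {
  offset_25c_mv + slope_mv_per_c * (temp_c - 25)
}

# Toy raw redox readings
swap <- tibble(
  datetime = t0,
  plot = "Control",
  depth_cm = c(5, 15, 30, 50),
  redox_mv = c(400, 500, 520, 580)
)

swap_eh <- swap %>%
  left_join(teros_filled, by = c("datetime", "plot", "depth_cm")) %>%
  mutate(eh_mv = redox_mv + ref_offset_mv(soil_temp_c))
```

Copying the 30 cm rows and relabeling them as 50 cm keeps the join keys identical between the two datasets, so every redox row finds a temperature.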

## # A tibble: 21,628 × 5
## # Groups:   datetime, plot [5,407]
##    datetime            plot       depth_cm eh_mv datetime_est       
##    <dttm>              <chr>         <dbl> <dbl> <chr>              
##  1 2023-06-05 17:40:00 Control           5  609. 2023-06-05 17:40:00
##  2 2023-06-05 17:40:00 Control          15  707. 2023-06-05 17:40:00
##  3 2023-06-05 17:40:00 Control          30  738. 2023-06-05 17:40:00
##  4 2023-06-05 17:40:00 Control          50  789. 2023-06-05 17:40:00
##  5 2023-06-05 17:40:00 Estuarine         5  665. 2023-06-05 17:40:00
##  6 2023-06-05 17:40:00 Estuarine        15  711. 2023-06-05 17:40:00
##  7 2023-06-05 17:40:00 Estuarine        30  737. 2023-06-05 17:40:00
##  8 2023-06-05 17:40:00 Estuarine        50  757. 2023-06-05 17:40:00
##  9 2023-06-05 17:40:00 Freshwater        5  791. 2023-06-05 17:40:00
## 10 2023-06-05 17:40:00 Freshwater       15  645. 2023-06-05 17:40:00
## # ℹ 21,618 more rows

Final plots of data!

Control

Freshwater

Estuarine